feat: add filter methods to MongoDB DocumentStore #2474
Merged
Related Issues
`update_by_filter()` and `delete_by_filter()` operations to `MongoDBDocumentStore` #2331

Proposed Changes:
Added filter-based bulk operations to `MongoDBAtlasDocumentStore` to support production RAG pipelines (parent issue #8508):
- `delete_by_filter(filters)`: Deletes all documents matching the provided Haystack metadata filters using MongoDB's `delete_many()` API. Returns the count of deleted documents.
- `update_by_filter(filters, meta)`: Updates metadata fields for all documents matching the provided filters using MongoDB's `update_many()` with `$set` operator. Updates fields in the `meta.{key}` path since MongoDB stores documents with `flatten=False`. Returns the count of modified documents.

Both methods include async versions (`delete_by_filter_async` and `update_by_filter_async`) and use the existing `_normalize_filters()` function for consistent filter handling across the document store.

How did you test it?
Added integration tests for both sync and async versions:
- `test_delete_by_filter`: Verifies selective deletion based on metadata filters
- `test_update_by_filter`: Verifies metadata updates for filtered documents
- `test_delete_by_filter_async`: Async version of delete test
- `test_update_by_filter_async`: Async version of update test

All tests validate:
Notes for the reviewer
- `time.sleep()` needed unlike OpenSearch
- `meta.{key}` path because MongoDB stores documents with `flatten=False` (line 737 in document_store.py)
- `deleted_count` and `modified_count` attributes from operation results

Checklist
`fix:`, `feat:`, `build:`, `chore:`, `ci:`, `docs:`, `style:`, `refactor:`, `perf:`, `test:`.
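
For reviewers who want a concrete picture of the `meta.{key}` update path described under Proposed Changes, here is a minimal sketch of how such a `$set` document could be built. The helper name `build_meta_update` is hypothetical and is not taken from this PR; the actual implementation in `document_store.py` may differ.

```python
from typing import Any, Dict


def build_meta_update(meta: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Hypothetical helper (not from the PR): map each metadata key to its
    nested `meta.<key>` path so that `$set` changes only those fields and
    leaves the rest of the stored `meta` sub-document untouched."""
    return {"$set": {f"meta.{key}": value for key, value in meta.items()}}


update_doc = build_meta_update({"status": "archived", "version": 2})
print(update_doc)

# With a live pymongo collection (assumed, not created here), the update
# document would be passed to update_many and the count read from the result:
#   result = collection.update_many(mongo_filter, update_doc)
#   result.modified_count
# The delete path is analogous:
#   collection.delete_many(mongo_filter).deleted_count
```

Targeting dotted paths rather than replacing the whole `meta` field means keys that are not listed in the update are preserved.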